88 research outputs found

Data Mining Using Relational Database Management Systems

Authors: Beibei Zou, Bettina Kemme, Doina Precup, Glen Newton, Xuesong Ma
Publication date: 01/01/2006

Software packages that provide a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, which limits the amount of data they can handle. In this paper we use a relational database as secondary storage to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database backend, we propose a general approach that provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as a back tier to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka's standard main-memory data structure interface. Furthermore, some general mining tasks are transferred into the database system to speed up execution. We tested the extended system, referred to as WekaDB, and our results show that it achieves much higher scalability than Weka while providing the same output and maintaining good computation time.

CogPrints Cognitive Sciences Eprint Archive

Flower-CDN: A hybrid P2P overlay for Efficient Query Processing in CDN

Authors: El Dick Manal, Kemme Bettina, Pacitti Esther
Publication venue: HAL CCSD
Publication date: 24/03/2009

Many websites with a large user base, e.g., websites of non-profit organizations, do not have the financial means to install large web servers or use specialized content distribution networks such as Akamai. For those websites, we have developed Flower-CDN, a locality-aware peer-to-peer content distribution network in which the users who are interested in a website support the distribution of its content. The idea is that peers keep the web pages they retrieve and later serve them to other peers that are close to them in locality. Our architecture is a hybrid between structured and unstructured networks. When a node requests a web page from a website for the first time, a locality-aware DHT quickly finds a peer in its neighborhood that has the web page available. Additionally, all peers in a given region that maintain content of a particular website build an unstructured content overlay. Within a content overlay, peers gossip information about their content, allowing the system to maintain accurate information despite failures and churn. In our detailed performance evaluation, we compare Flower-CDN with Squirrel, a content distribution network that is strictly based on DHTs and is not locality-aware. Compared to Squirrel, Flower-CDN reduces lookup latency by a factor of 9 and the transfer distance by a factor of 2. We also show that Flower-CDN's gossiping has low overhead and can be adjusted according to hit ratio requirements and bandwidth availability.

A Highly Robust P2P-CDN Under Large-Scale and Dynamic Participation

Authors: El Dick Manal, Kemme Bettina, Pacitti Esther
Publication venue: HAL CCSD
Publication date: 11/10/2009

By building a P2P Content Distribution Network (CDN), peers collaborate to distribute the content of under-provisioned websites and to serve queries for larger audiences on behalf of the websites. This can prove very challenging, given the highly dynamic and autonomous participation of peers. Indeed, the P2P-CDN should adapt to increasing numbers of participants and provide robust algorithms under churn, because these issues have a key impact on performance. Also, the distribution of tasks and content over peers should take their interests into account in order to give them proper incentives to cooperate. Finally, the routing of queries should target peers that are close in locality and serve content from close-by providers to reduce network overload and achieve scalability. We have previously proposed a locality- and interest-aware P2P-CDN, Flower-CDN, that lacks efficient management of robustness and scalability. In this paper, we focus on these crucial shortcomings and propose PetalUp-CDN. The performance evaluation with respect to scalability and churn shows highly significant gains.

Short paper: Cheat Detection and Prevention in P2P MOGs

Authors: Huguenin Kévin, Kemme Bettina, Yahyavi Amir
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2011

In peer-to-peer games, cheaters can easily disrupt the game state computation and dissemination, perform illegal actions, and unduly gain access to sensitive information. We propose AntiCheat, a cheat detection and prevention protocol that follows a mutual verification approach complemented with information exposure mitigation. It is based on a randomized dynamic proxy scheme for both the dissemination and verification of actions, and it further reduces the information exposed to players to close to the minimum required to render the game. We build a proof-of-concept prototype on top of Quake III. Experiments with up to 48 players show that opportunities to cheat can be significantly reduced, even in the presence of colluding cheaters, while keeping good performance.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

P2Prec: a Social-based P2P Recommendation System for Large-scale Data Sharing

Authors: Draidi Fady, Kemme Bettina, Pacitti Esther, Valduriez Patrick
Publication venue: HAL CCSD
Publication date: 01/01/2010

We propose P2Prec, a P2P recommendation system for large-scale data sharing that exploits friendship links. The main idea is to recommend high-quality contents related to query topics, drawn from the contents of friends (or friends of friends) who are experts on the topics related to the query. Expertise is implicitly deduced from the contents stored by a user. To exploit friendship links, we rely on Friend-Of-A-Friend (FOAF) descriptions. To disseminate information about experts, we propose new semantic-based gossip algorithms that provide scalability, robustness, simplicity and load balancing. Using information retrieval techniques, we propose an efficient query routing algorithm that recommends the best peers to serve a query. In our experimental evaluation, using the TREC09 dataset and the Wiki vote social network, we show that semantic gossiping increases recall by a factor of 2.5 compared with well-known random gossiping. Furthermore, P2Prec achieves reasonable recall with acceptable query processing load and network traffic.

CiteSeerX

INRIA a CCSD electronic archive server

Area-based gossip multicast

Authors: Buchmann Alejandro, Kabus Patric, Kemme Bettina, Seeger Christian
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2008

TUbiblio

Crossref

Distributed Data Management in 2020?

Authors: Abiteboul Serge, Ooi Beng Chin, Jiménez-Peris Ricardo, Kemme Bettina, Özsu M. Tamer, Valduriez Patrick
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

Work on distributed data management commenced shortly after the introduction of the relational model in the mid-1970s. The 1970s and 1980s were very active periods for the development of distributed relational database technology, and claims were made that within the following ten years centralized databases would be an "antique curiosity" and most organizations would move toward distributed database managers [1]. That prediction has certainly come true, and all commercial DBMSs today are distributed.

Crossref

ScholarBank@NUS

Archivo Digital UPM